Typing for DataArray/Dataset #2929

Merged
merged 35 commits into from
Jun 25, 2019

Conversation

crusaderky
Contributor

@crusaderky crusaderky commented Apr 29, 2019

Status:

  • I'm generally not pleased with the amount of added verbosity. Happy to accept suggestions on how to improve.
  • Switching all variable names from str to Hashable. Without proper unit tests, however (out of scope), non-string hashables are expected not to work most of the time. My preference would still be to stay limited to str...
  • DataArray done.
  • Dataset not done (except where it was hindering DataArray).
  • mypy passes with a single error: "Mapping[...]" has no attribute "copy". This is because I can't see a way to use typing.OrderedDict without breaking compatibility with Python < 3.7.2.
  • py.test should be successful
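
On the typing.OrderedDict point, one possible workaround (an assumption for illustration, not necessarily what this PR adopts) is to keep importing OrderedDict from collections and to quote the parametrised annotation, so that the subscript is only ever read by the type checker and never evaluated at runtime. The helper name `copy_with_update` is hypothetical.

```python
from collections import OrderedDict
from typing import Any, Hashable


def copy_with_update(
    attrs: "OrderedDict[Hashable, Any]", key: Hashable, value: Any
) -> "OrderedDict[Hashable, Any]":
    """Return a copy of ``attrs`` with one extra entry (hypothetical helper)."""
    # The quoted annotations are not evaluated at runtime, so
    # collections.OrderedDict is never subscripted on older Pythons.
    out = attrs.copy()  # a type checker sees OrderedDict.copy() here
    out[key] = value
    return out
```

Because the parameter is declared as an OrderedDict rather than a Mapping, the `.copy()` call should no longer be reported as a missing attribute.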

@shoyer any early feedback is appreciated

@pep8speaks

pep8speaks commented Apr 29, 2019

Hello @crusaderky! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-25 18:14:47 UTC

@crusaderky
Contributor Author

One of the many reasons why I'm strongly against generic Hashables instead of strings is that I expect user code all over the world to have string-specific functions and methods applied to variable names and dimensions.
e.g.

[d for d in arr.dims if d.startswith('foo')]

The above will produce an error in mypy, and the user will be forced either to disable the check or to verbosely convince mypy, with a typing.cast, that their project never actually uses non-string dims.
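
To make the cost concrete, here is a minimal sketch of the two workarounds a user would have if dims were typed as Hashable. The function names are hypothetical, and the `Tuple[Hashable, ...]` signature is an assumption about how dims might end up being annotated.

```python
from typing import Hashable, List, Tuple, cast


def dims_starting_with(dims: Tuple[Hashable, ...], prefix: str) -> List[str]:
    # Option 1: assert to the type checker that every dim is a str.
    # cast() performs no runtime check, so this is a promise, not a test.
    str_dims = cast(Tuple[str, ...], dims)
    return [d for d in str_dims if d.startswith(prefix)]


def dims_starting_with_checked(
    dims: Tuple[Hashable, ...], prefix: str
) -> List[str]:
    # Option 2: narrow each element with isinstance, which also
    # silently skips any non-string dims at runtime.
    return [d for d in dims if isinstance(d, str) and d.startswith(prefix)]
```

Either way the one-line comprehension above picks up extra ceremony.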

@max-sixty
Collaborator

Thanks @crusaderky , this looks great!

Re:

Switching all variable names from str to Hashable. Without proper unit tests, however (out of scope), non-string hashables are expected not to work most of the time. My preference would still be to stay limited to str...

Curious on others' thoughts. I would vote to allow Hashable for data variable names in Datasets - similar to other Mappings - but less so for dimension names in arrays. (Not opposed to it per se, open to instances where that would be helpful)

Guido Imperiale added 4 commits May 8, 2019 02:48
@crusaderky
Contributor Author

DataArray finished; unit tests (hopefully) successful. Will now move to Dataset.
Would appreciate an official, final decision from @shoyer et al. on Hashable vs. str before I continue, also in light of all the hacks Hashable forced me to go through.

@shoyer
Member

shoyer commented May 8, 2019

@crusaderky let's discuss that over in #2292. For now, I would try to defer making a decision on non-string dimension/variable names, even though that means we will have less informative type annotations.

Guido Imperiale and others added 6 commits May 9, 2019 10:33
@max-sixty
Collaborator

@crusaderky I finished this off as best I could. Tests pass though needs a review. Here's the code: https://github.com/max-sixty/xarray/tree/annotations

(I tried pasting a patch, but my git-fu isn't enough to exclude all the master code given there's a merge commit in there)

@max-sixty max-sixty mentioned this pull request Jun 15, 2019
@crusaderky
Contributor Author

@max-sixty thanks for the work! I'll go through it ASAP

@crusaderky
Contributor Author

@max-sixty I merged your branch in and did some tweaks to the OrderedDict's.
Now moving to add more typing info to dataset.py...

- mapping {coord name: DataArray}
- mapping {coord name: Variable}
- mapping {coord name: (dimension name, array-like)}
- mapping {coord name: (tuple of dimension names, array-like)}
Collaborator

👍

@max-sixty
Collaborator

Great, thanks @crusaderky ! Let me know if I can do anything to help!

@max-sixty
Collaborator

Just fixed some more merge conflicts over at https://github.com/max-sixty/xarray/tree/annotations.

Thoughts on merging this now and continuing off master?

@dcherian
Contributor

Thoughts on merging this now and continuing off master?

+1 else the merge conflicts will just pile up.

@crusaderky
Contributor Author

Sorry I didn't have time last week - I'll try merging into the pr ASAP. I'm in favour of merging to mainline too

@max-sixty
Collaborator

Cool, I think it should be a straight fast-forward merge from my branch to yours, and then an easy merge to master

@max-sixty
Collaborator

max-sixty commented Jun 25, 2019

Great, I've resolved those comments @shoyer , thanks: https://github.com/max-sixty/xarray/tree/annotations

@crusaderky we should be ready to go - maybe take a glance over to ensure you're happy.

I added a whatsnew with both of us - hope that's OK

@shoyer
Member

shoyer commented Jun 25, 2019

@max-sixty do you want to push those changes to @crusaderky's branch? As a maintainer for xarray, I think you should have push permissions.

@max-sixty
Collaborator

Ah it works! Didn't know permissions transferred to forks.

@crusaderky hope that's OK, not wanting to step on your toes here...

attrs=None, encoding=None, indexes=None, fastpath=False):
def __init__(self, data: Any,
coords: Union[
Sequence[Tuple[Hashable, Any]],
Member

This is still not quite right.

Collaborator

Erm, in the latest it's just Tuple, is that OK? (I think maybe I pushed an older one first and you looked over that)

Member

Tuple should be fine, but it looks like that commit isn't on this branch yet

data_vars: Optional[Mapping[Hashable, Union[
'DataArray', Variable,
Tuple[Hashable, Any],
Tuple[Tuple[Hashable, ...], Any]]]] = None,
Member

This one also can handle any tuples args that you can use like Variable(*args) (maybe we could define a VariableArgs alias of some sort?)

Collaborator

Yes I added a comment re VariableArgs, def would be good to move some of these long definitions into a single place (but if you're OK with leaving until the next PR let's do that)
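
As a rough illustration of what such an alias could look like: the names `DimsLike` and `VariableArgs`, and the exact tuple shapes, are assumptions based on the (dims, data, attrs, encoding) argument order of Variable, not something settled in this thread. `normalize_dims` is a hypothetical helper.

```python
from typing import Any, Hashable, Mapping, Tuple, Union

# Hypothetical aliases; the names and exact shapes are assumptions.
DimsLike = Union[Hashable, Tuple[Hashable, ...]]
VariableArgs = Union[
    Tuple[DimsLike, Any],                    # (dims, data)
    Tuple[DimsLike, Any, Mapping],           # (dims, data, attrs)
    Tuple[DimsLike, Any, Mapping, Mapping],  # (dims, data, attrs, encoding)
]


def normalize_dims(args: VariableArgs) -> Tuple[Hashable, ...]:
    """Return the dimension names of a Variable-like tuple as a tuple."""
    dims = args[0]
    # A tuple is treated as several dimension names; anything else
    # (e.g. a bare string) is treated as a single dimension name.
    return dims if isinstance(dims, tuple) else (dims,)
```

Defining the alias once would let the long Union in the data_vars annotation shrink to something like `Mapping[Hashable, Union['DataArray', Variable, VariableArgs]]`. Note that a tuple is itself Hashable, so `DimsLike` is ambiguous to a type checker; that is one of the costs of allowing non-string names.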

@shoyer
Member

shoyer commented Jun 25, 2019

By the way, I think I can write a script based on pep8speaks that writes/updates GitHub comments from within Travis or Azure Pipelines. We could use that for surfacing mypy results in pull requests. I will probably give that a try over the next few days....

@max-sixty
Collaborator

By the way, I think I can write a script based on pep8speaks that writes/updates GitHub comments from within Travis or Azure Pipelines. We could use that for surfacing mypy results in pull requests. I will probably give that a try over the next few days....

That would be v cool. I'm kinda surprised something like pep8speaks doesn't do this already tbh...

@shoyer
Member

shoyer commented Jun 25, 2019

By the way, I think I can write a script based on pep8speaks that writes/updates GitHub comments from within Travis or Azure Pipelines. We could use that for surfacing mypy results in pull requests. I will probably give that a try over the next few days....

That would be v cool. I'm kinda surprised something like pep8speaks doesn't do this already tbh...

I was considering adding this into pep8speaks, but I think this is fundamentally incompatible with pep8speaks' design. It runs a handful of pre-specified linters, without actually installing/running user code. But something like mypy needs to actually have xarray and its dependencies installed to work.
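
A minimal sketch of the formatting half of such a script, assuming mypy has already been run in CI with its default `path:line: error: message` output. The function name `format_mypy_comment` is hypothetical, and posting the resulting text to the pull request is deliberately left out.

```python
import re
from typing import List

# mypy's default error lines look like "path:line: error: message".
_ERROR_LINE = re.compile(r"^(?P<path>[^:]+):(?P<line>\d+): error: (?P<msg>.+)$")


def format_mypy_comment(mypy_output: str) -> str:
    """Turn raw mypy output into the body of a pull-request comment."""
    rows: List[str] = []
    for line in mypy_output.splitlines():
        match = _ERROR_LINE.match(line.strip())
        if match:
            rows.append("- `{path}:{line}`: {msg}".format(**match.groupdict()))
    if not rows:
        return "mypy found no errors."
    header = "mypy found {} error(s):".format(len(rows))
    return "\n".join([header] + rows)
```

The CI job, which has xarray and its dependencies installed, would capture mypy's output, pass it through a function like this, and then create or update a single comment on the pull request.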

@max-sixty
Collaborator

I was considering adding this into pep8speaks, but I think this is fundamentally incompatible with pep8speaks' design. It runs a handful of pre-specified linters, without actually installing/running user code. But something like mypy needs to actually have xarray and its dependencies installed to work.

Yes, good point.

I haven't spent much time with the GitHub Checks, but potentially could be a good fit - one issue with the simpler "Update the comment" model is that it doesn't encode to GH whether it's safe to merge, so projects need to duplicate the test in CI

@max-sixty
Collaborator

(AppVeyor fail looks unrelated)

@shoyer shoyer merged commit d3f6db9 into pydata:master Jun 25, 2019
@shoyer
Member

shoyer commented Jun 25, 2019

Thanks @max-sixty and @crusaderky !

@max-sixty
Collaborator

Thanks @crusaderky !

@crusaderky crusaderky changed the title WIP: typing for DataArray/Dataset Typing for DataArray/Dataset Jun 30, 2019
@crusaderky crusaderky deleted the annotations branch June 30, 2019 10:08